Connectionist Architectures for Multi-Speaker Phoneme Recognition
John B. Hampshire II, Alex Waibel
We present a number of Time-Delay Neural Network (TDNN) based architectures for multi-speaker phoneme recognition (the /b,d,g/ task). We use speech from two female and four male speakers to compare the performance of the various architectures against a baseline recognition rate of 95.9% for a single TDNN on the six-speaker /b,d,g/ task. This series of modular designs leads to a highly modular multi-network architecture capable of performing the six-speaker recognition task at the speaker-dependent rate of 98.4%. In addition to its high recognition rate, the so-called "Meta-Pi" architecture learns, without direct supervision, to recognize the speech of one particular male speaker using internal models of other male speakers exclusively.
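The abstract does not spell out how the multi-network architecture merges its speaker-specific modules. As a rough illustration only, the sketch below assumes that a gating network emits one raw score per module, that these scores are softmax-normalized into weights, and that the final /b,d,g/ scores are the weighted sum of the module outputs. The names `softmax`, `combine`, and `classify`, and all numeric values, are hypothetical and are not taken from the paper.

```python
import math

PHONEMES = ("b", "d", "g")


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(module_outputs, gate_scores):
    """Weighted sum of per-module phoneme scores.

    module_outputs: one list of len(PHONEMES) scores per module
                    (assumed here to be speaker-specific networks).
    gate_scores:    one raw gating score per module (assumption).
    """
    if len(module_outputs) != len(gate_scores):
        raise ValueError("need exactly one gate score per module")
    weights = softmax(gate_scores)
    combined = [0.0] * len(PHONEMES)
    for weight, output in zip(weights, module_outputs):
        for k, score in enumerate(output):
            combined[k] += weight * score
    return combined


def classify(module_outputs, gate_scores):
    """Return the phoneme with the highest combined score."""
    combined = combine(module_outputs, gate_scores)
    best = max(range(len(PHONEMES)), key=combined.__getitem__)
    return PHONEMES[best]


# Two hypothetical modules that disagree; the gate decides which one dominates.
outputs = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]]
print(classify(outputs, [2.0, 0.0]))
print(classify(outputs, [0.0, 2.0]))
```

Under this assumed scheme, a speaker with no dedicated module could still be recognized if the gate spreads its weight over the modules of similar speakers, which is one way to read the abstract's closing claim.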